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Abstract. Many networks are important because they are substrates for 
dynamical systems, and their pattern of functional connectivity can itself 
be dynamic — they can functionally reorganize, even if their underlying 
anatomical structure remains fixed. However, the recent rapid progress 
in discovering the community structure of networks has overwhelmingly 
focused on that constant anatomical connectivity. In this paper, we lay 
out the problem of discovering functional communities, and describe an 
approach to doing so. This method combines recent work on measuring 
information sharing across stochastic networks with an existing and suc- 
cessful community- discovery algorithm for weighted networks. We illus- 
trate it with an application to a large biophysical model of the transition 
from beta to gamma rhythms in the hippocampus. 



1 Introduction 

The community discovery problem for networks is that of splitting a graph, 
representing a group of interacting processes or entities, into sub-graphs (com- 
munities) which are somehow modular, so that the nodes belonging to a given 
sub-graph interact with the other members more strongly than they do with 
the rest of the network. As the word "community" indicates, the problem has 
its roots in the study of social structure and cohesion [1, 2, 3], but is related to 
both general issues of clustering in statistical data mining [4] and to the systems- 
analysis problem of decomposing large systems into weakly-coupled sub-systems 
[5,6,7]. 

The work of Newman and Girvan [8] has inspired a great deal of research 
on statistical-mechanical approaches to community detection in complex net- 
works. (For a recent partial review, see [9].) To date, however, this tradition 
has implicitly assumed that the network is defined by persistent, if not static, 
connections between nodes, whether through concrete physical channels (e.g., 
electrical power grids, nerve fibers in the brain), or through enduring, settled 
patterns of interaction (e.g., friendship and collaboration networks). However, 
networks can also be defined through coordinated behavior, and the associated 
sharing of dynamical information; neuroscience distinguishes these as, respec- 
tively, "anatomical" and "functional" connectivity [10, 11]. The two sorts of 



connectivity do not map neatly onto each other, and it would be odd if func- 
tional modules always lined up with anatomical ones. Indeed, the same system 
could have many different sets of functional communities in different dynamical 
regimes. For an extreme case, consider globally-coupled map lattices [12], which 
are important statistical-mechanical models of physical and biological pattern 
formation [13]. In these systems, the "anatomical" network is fully connected, 
so there is only a single (trivial) community. Nonetheless, in some dynamical 
regimes they spontaneously develop many functional communities, i.e., groups 
of nodes which are internally coherent but with low inter-group coordination 
[14]- 3 

Coupled map lattices are mathematical models, but the distinction between 
anatomical and functional communities is not merely a conceptual possibility. 
Observation of neuronal networks in vivo show that it is fairly common for, e.g., 
central pattern generators to change their functional organization considerably, 
depending on which pattern they are generating, while maintaining a constant 
anatomy [15]. Similarly, neuropsychological evidence has long suggested that 
there is no one-to-one mapping between higher cognitive functions and special- 
ized cortical modules, but rather that the latter participate in multiple functions 
and vice versa, re-organizing depending on the task situation [16]. Details of 
this picture, of specialized anatomical regions supporting multiple patterns of 
functional connectivity, have more recently been filled in by brain imaging stud- 
ies [11]. Similar principles are thought to govern the immune response, cellular 
signaling, and other forms of biological information processing [17]. Thus, in an- 
alyzing these biological networks, it would be highly desirable to have a way of 
detecting functional communities, rather than just anatomical ones. Similarly, 
while much of the work on social network organization concerns itself with the 
persistent ties which are analogous to anatomy, it seems very likely [18, 19] that 
these communities cut in complicated ways across the functional ones defined by 
behavioral coordination [20, 21] or information flow [22]. This is perhaps particu- 
larly true of modern societies, which are thought, on several grounds [23, 24, 19] 
to be more flexibly organized than traditional ones. 

In this paper, we propose a two-part method to discover functional commu- 
nities in network dynamical systems. Section 2.1 describes the first part, which is 
to calculate, across the whole of the network, an appropriate measure of behav- 
ioral coordination or information sharing; we argue that informational coherence, 
introduced in our prior work [25], provides such a measure. Section 2.2 describes 
the other half of our method, using our measure of coordination in place of a 
traditional adjacency matrix in a suitable community-discovery algorithm. Here 
we employ the Potts model procedure proposed by Reichardt and Bornholdt 
[26, 27]. Section 2.3 summarizes the method and clarifies the meaning of the 
functional communities it finds. Sections 3 and 4 apply our method to a detailed 
biophysical model of collective oscillations in the hippocampus [28], where it 
allows us to detect the functional re-organization accompanying the transition 



3 We are preparing a separate paper on functional communities in coupled map lat- 
tices. 



from gamma to beta rhythms. Finally, Sect. 5 discusses the limitations of our 
method and its relations to other approaches (Sect. 5.1) and some issues for 
future work (Sect. 5.2). 

2 Discovering Behavioral Communities 

There are two parts to our method for finding functional communities. We first 
calculate a measure of the behavioral coordination between all pairs of nodes in 
the network: here, the informational coherence introduced in [25]. We then feed 
the resulting matrix into a suitable community-discovery algorithm, in place of 
the usual representation of a network by its adjacency matrix. Here, we have 
used the Rcichardt-Bornholdt algorithm [26] , owing to its Hamiltonian form, its 
ability to handle weighted networks, and its close connection to modularity. 

2.1 Informational Coherence 

We introduced informational coherence in [25] to measure the degree to which 
the behavior of two systems is coordinated, i.e., how much dynamically-relevant 
information they share. Because of its centrality to our method, we briefly reca- 
pitulate the argument of that paper. 

The starting point is the strong notion of "state" employed in physics and 
dynamical systems theory: the state of the system is a variable which determines 
the distribution of all present and future observables. In inferential terms, the 
state is a minimal sufficient statistic for predicting future observations [29], and 
can be formally constructed as measure-valued process giving the distribution 
of future events conditional on the history of the process. As a consequence, the 
state always evolves according to a homogeneous Markov process [30, 29] . 

In a dynamical network, each node i has an associated time-series of obser- 
vations Xi(t). This is in turn generated by a Markovian state process, Si(t), 
which forms its optimal nonlinear predictor. For any two nodes i and j, the 
informational coherence is 



where I[Sf,Sj] is the mutual information shared by Si and Sj, and H[Si] is 
the self-information (Shannon entropy) of Since I[Si\ Sj] < min H [Si], H [Sj], 
this is a symmetric quantity, normalized to lie between and 1 inclusive. The 
construction of the predictive states ensures that Si (t) encapsulates all informa- 
tion in the past of Xi(t) which is relevant to its future, so a positive value for 
I[Si]Sj] means that Sj(t) contains information about the future of Xi(t). That 
is, a positive value of I[Si;Sj] is equivalent to the sharing of dynamically rele- 
vant information between the nodes, manifesting itself as coordinated behavior 
on the part of nodes i and j. 

Clearly, a crucial step in calculating informational coherence is going from 
the observational time series Xi(t) to the predictive state series Si(t). In certain 
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cases with completely specified probability models, this can be done analyti- 
cally [29, 31]. In general, however, we are forced to reconstruct the appropriate 
state-space structure from the time series itself. State reconstruction for deter- 
ministic systems is based on the Takens embedding theorem, and is now routine 
[32]. However, biological and social systems are hardly ever deterministic at 
experimentally-accessible levels of resolution, so we need a stochastic state re- 
construction algorithm. Several exist; we use the CSSR algorithm introduced in 
[33], since, so far as we know, it is currently the only stochastic state reconstruc- 
tion algorithm which has been proved statistically consistent (for conditionally 
stationary discrete sequences). We briefly describe CSSR in Appendix A. 

Informational coherence is not, of course, the only possible way of measuring 
behavioral coordination, or functional connectivity. However, it has a number 
of advantages over rival measures [25]. Unlike measures of strict synchroniza- 
tion, which insist on units doing exactly the same thing at exactly the same 
time, it accommodates phase lags, phase locking, chaotic synchronization, etc., 
in a straightforward and uniform manner. Unlike cross-covariance, or the re- 
lated spectral coherence, it easily handles nonlinear dependencies, and does not 
require the choice of a particular lag (or frequency, for spectral coherence), be- 
cause the predictive states summarize the entire relevant portion of the history. 
Generalized synchrony measures [34] can handle nonlinear relationships among 
states, but inappropriately assume determinism. Finally, mutual information 
among the observables, I[Xi;Xj], can handle nonlinear, stochastic dependen- 
cies, but suffers, especially in neural systems, because what we really want to 
detect are coordinated patterns of behavior, rather than coordinated instanta- 
neous actions. Because each predictive state corresponds to a unique statistical 
pattern of behavior, mutual information among these states is the most natural 
way to capture functional connectivity. 

2.2 The Reichardt-Bornholdt Community Discovery Algorithm 

The Reichardt-Bornholdt [26, 27] community discovery algorithm finds groups 
of nodes that are densely coupled to one another, but only weakly coupled to 
the rest of the network, by establishing a (fictitious) spin system on the network, 
with a Hamiltonian with precisely the desired properties, and then minimizing 
the Hamiltonian through simulated annealing. More concretely, every node i is 
assigned a "spin" <7j , which is a discrete variable taking an integer value from 1 to 
a user-defined q. A "community" or "module" will consist of all the nodes with a 
common spin value. The spin Hamiltonian combines a ferromagnetic term, which 
favors linked nodes taking the same spin (i.e., being in the same community), and 
an anti-ferromagnetic term, which favors non-linked nodes taking different spins 
(i.e., being in different community). Both interactions are of the Potts model 
type, i.e., they are invariant under permutations of the integers labeling the 
clusters. After some algebraic manipulation [27], one arrives at the Hamiltonian 
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where Aij is the adjacency matrix, S(-, •) is the Kronecker delta function, pij is a 
matrix of non-negative constants giving the relative weights of different possible 
links, and 7 gives the relative contribution of link absence to link presence. 
The choice of Pij is actually fairly unconstrained, but previous experience with 
community discovery suggests that very good results are obtained by optimizing 
the Newman modularity Q [35] 

i,j 

where h is the degree of node i, and 2M = J2i k the total number of links. 
Essentially, Newman's Q counts the number of edges within communities, minus 
the number which would be expected in a randomized graph where each node 
preserved its actual degree [9], and a% were IID uniform. Setting p^ = kikj/2M 
and 7 = 1, we see that TC(a) and — Q(cr) differ only by a term (the diagonal 
part of the sum for Q) which does not depend on the assignment of nodes to 
communities. Thus, minimizing TC(a) is the same as maximizing the modularity. 
Varying 7, in this scheme, effectively controls the trade-off between having many 
small communities and a few large ones [27], and makes it possible to discover 
a hierarchical community structure, which will be the subject of future work. 

While this procedure was originally developed for the case where is a 
0-1 adjacency matrix, it also works perfectly well when links take on (positive) 
real- valued strengths. In particular, using A.^ = ICij, we can still maximize 
the modularity, taking the "degree" of node i to be ki — ^2 j ICij [27]. The 
interpretation of the modularity is now the difference between the strength of 
intra-community links, and a randomized model where each node shares its link 
strength indifferently with members of its own and other communities. 

2.3 Summary of the Method 

Let us briefly summarize the method for discovering functional communities. We 
begin with a network, consisting of N nodes. For each node, we have a discrete- 
value, discrete-time ("symbolic") time series, {xi(t)}, recorded simultaneously 
over all nodes. The CSSR algorithm is applied to each node's series separately, 
producing a set of predictive states for that node, and a time series of those 
states, {sj(t)}. We then calculate the complete set of pair wise informational 
coherence values, {ICij}, using Eq. 1. This matrix is fed into the Reichardt- 
Bornholdt procedure, with A^ = ICij, which finds an assignment of spins to 
nodes, {(Ji\, minimizing the Hamiltonian in Eq. 2. The functional communities 
of the dynamical network consist of groups of nodes with common spin values. 
Within each community, the average pairwise coherence of the nodes is strictly 
greater than would be expected from a randomizing null model (as described 
in the previous paragraph). Furthermore, between any two communities, the 
average pairwise coherence of their nodes is strictly less than expected from 
randomization [27]. 



3 Test on a Model System of Known Structure: Collective 
Oscillations in the Hippocampus 

We use simulated data as a test case, to validate the general idea of our method, 
because it allows us to work with a substantial network where we nonetheless 
have a strong idea of what appropriate results should be. Because of our ulti- 
mate concern with the functional re-organization of the brain, we employed a 
large, biophysically-detailed neuronal network model, with over 1000 simulated 
neurons. 

The model, taken from [28], was originally designed to study episodes of 
gamma (30-80Hz) and beta (12-30Hz) oscillations in the mammalian nervous 
system, which often occur successively with a spontaneous transition between 
them. More concretely, the rhythms studied were those displayed by in vitro 
hippocampal (CA1) slice preparations and by in vivo neocortical EEGs. 

The model contains two neuron populations: excitatory (AMPA) pyramidal 
neurons and inhibitory (GABA^) interneurons, defined by conductance-based 
Hodgkin-Huxley-style equations. Simulations were carried out in a network of 
1000 pyramidal cells and 300 interneurons. Each cell was modeled as a one- 
compartment neuron with all-to-all coupling, endowed with the basic sodium 
and potassium spiking currents, an external applied current, and some Gaussian 
input noise. The anatomical, synaptic connections were organized into blocks, 
as shown in Fig. 2. 

The first 10 seconds of the simulation correspond to the gamma rhythm, 
in which only a group of neurons is made to spike via a linearly increasing 
applied current. The beta rhythm (subsequent 10 seconds) is obtained by acti- 
vating pyramidal-pyramidal recurrent connections (potentiated by Hcbbian pre- 
processing as a result of synchrony during the gamma rhythm) and a slow out- 
ward after-hypcr-polarization (AHP) current (the M-current), suppressed during 
gamma due to the metabotropic activation used in the generation of the rhythm. 
During the beta rhythm, pyramidal cells, silent during gamma rhythm, fire on 
a subset of interneurons cycles (Fig. 1). 

4 Results on the Model 

A simple heat-map display of the informational coherence (Fig. 3) shows little 
structure among the active neurons in cither regime. However, visual inspection 
of the rastergrams (Fig. 1) leads us to suspect the presence of two very large 
functional communities: one, centered on the inhibitory interneurons and the 
excitatory pyramidal neurons most tightly coupled to them, and another of the 
more peripheral excitatory neurons. During the switch from the gamma to the 
beta rhythm, we expect these groups to re-organize. 

These expectations are abundantly fulfilled (Fig. 4) . We identified communi- 
ties by running the Reichardt-Bornholdt algorithm with the maximum number 
of communities (spin states) set to 25, the modularity Hamiltonian, and 7 = 1. 
(Results were basically unchanged at 40 or 100 spin values.) In both regimes, 
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Fig. 1. Rastergrams of neuronal spike-times in the network. Excitatory, pyramidal 
neurons (numbers 1 to 1000) are green, inhibitory interneurons (numbers 1001 to 1300) 
are red. During the first 10 seconds (a), the current connections among the pyramidal 
cells are suppressed and a gamma rhythm emerges (left). At t = 10s, those connections 
become active, leading to a beta rhythm (b, right). 
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Fig. 2. Schematic depiction of the anatomical network. Here nodes represent popu- 
lations of cells: excitatory pyramidal neurons (triangles labeled E) or inhibitory in- 
terneurons (large circle labeled I) . Excitatory connections terminate in bars, inhibitory 
connections in filled circles. During the gamma rhythm (a), the pyramidal neurons 
are coupled to each other only indirectly, via the interneurons, and dynamical effects 
separate the pyramidal population into participating (Ep) and suppressed (Es) sub- 
populations. During the beta rhythm (6), direct connections among the Ep neurons, 
built up, but not activated, by Hebbian learning under the gamma rhythm are turned 
on, and the connection from the Es neurons to the interneurons are weakened by the 
same Hebbian process (dashed line). 



there arc two overwhelmingly large communities, containing almost all of the 
neurons which actually fired, and a handful of single-neuron communities. The 
significant change, visible in the figure, is in the organization of these communi- 
ties. 

During the gamma rhythm, the 300 interneurons form the core of the larger of 
these two communities, which also contains 199 pyramidal neurons. Another 430 
pyramidal neurons belong to a second community. A final 5 pyramidal cells are in 
single-neuron communities; the rest do not fire at all. A hierarchical analysis (not 
shown) has the two large communities merging into a single super-community. 
The regular alternation of the two communities among the pyramidal neurons, 
evident in Fig. 4a, is due to the fact that the external current driving the pyra- 
midal neurons is not spatially uniform. 

With the switch to the beta rhythm, the communities grow and re-organize. 
The community centered on the interneurons expands, to 733 neurons, largely 
by incorporating many low-index pyramidal neurons which had formerly been 
silent, and are now somewhat erratically synchronized, into its periphery. Inter- 
estingly, many of the latter are only weakly coherent with any one interneuron 
(as can be seen by comparing Figs. 36 and 46). What is decisive is rather their 
stronger over-all pattern of coordination with the interneurons, shown by sharing 
a common (approximate) firing period, which is half that of the high-index pyra- 
midal cells (Fig. 16). Similarly, the other large community, consisting exclusively 




Fig. 3. Heat-maps of coordination across neurons in the network, measured by in- 
formational coherence. Colors run from red (no coordination) through yellow to pale 
cream (maximum). 



of pyramidal neurons, also grows (to 518 members), again by expanding into the 
low-index part of the network; there is also considerable exchange of high-index 
pyramidal cells between the two communities. Finally, nine low-index neurons, 
which fire only sporadically, belong in clusters of one or two cells. 

5 Discussion and Conclusion 

5.1 Limitations and Related Approaches 

Our method is distinguished from earlier work on functional connectivity primar- 
ily by our strong notion of functional community or module, and secondarily by 
our measure of functional connectivity. Previous approaches to functional con- 
nectivity (reviewed in [10, 11]) either have no concept of functional cluster, or 
use simple agglomerative clustering [4]; their clusters are just groups of nodes 
with pairwise-similar behavior. We avoid agglomerative clustering for the same 
reason it is no longer used to find anatomical communities: it is insensitive to 
the global pattern of connectivity, and fails to divide the network into coher- 
ent components. Recall (Sect. 2.3) that every functional community we find 
has more intra-cluster information sharing than is expected by chance, and less 
inter-cluster information sharing. This is a plausible formalization of the intuitive 
notion of "module", but agglomeration will not, generally, deliver it. 

As for using informational coherence to measure functional connectivity, we 
discussed its advantages over other measures in Sect. 2.1 above, and at more 
length in [25] . Previous work on functional connectivity has mostly used surface 
features to gauge connectivity, such as mutual information between observables. 
(Some of the literature on clustering general time series, e.g. [36, 37, 38], uses 
hidden Markov models to extract latent features, but in a mixture-model frame- 
work very different from our approach.) The strength of informational coherence 
is that it is a domain-neutral measure of nonlinear, stochastic coordination; its 
weakness is that it requires us to know the temporal sequence of predictive states 
of all nodes in the network. 

This need to know the predictive states of each node is the major limitation 
of our method. For some mathematical models, these states are analytically 
calculable, but in most cases they must be learned from discrete-value, discrete- 
time ("symbolic") time series. Those series must be fairly long; exactly how long 
is an on-going topic of investigation 4 , but, empirically, good results are rare with 
less than a few thousand time steps. Similarly, reliable estimates of the mutual 
information and informational coherence also require long time series. 

Predictive states can be mathematically defined for continuous- value, continuous- 
time systems [30], but all current algorithms for discovering them, not just CSSR, 
require symbolic time series. (Devising a state-reconstruction procedure for con- 
tinuous systems is another topic of ongoing research.) Spike trains, like e-mail 
networks [22], are naturally discrete, so this is not an issue for them, but in most 

4 CSSR converges on the true predictive states (see the appendix), but the rate of 
convergence is not yet known. 
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Fig. 4. Division of the network into functional communities. Black points denote pairs 
of nodes which are both members of a given community. During the gamma rhythm 
(a) the interneurons (numbered 1001 to 1300) form the core of a single community, 
along with some of the active pyramidal neurons; because of the spatially modulated 
input received by the latter, however, some of them belong to another community. Dur- 
ing beta rhythm (6), the communities re-organize, and in particular formerly inactive 
pyramidal neurons are recruited into the community centered on the interneurons, as 
suggested by the rastergrams. 



other cases we need to find a good symbolic partition first, which is non-trivial 
[39]. The need for long symbolic time series may be especially difficult to meet 
with social networks. 

5.2 Directions for Future Work 

Our results on the model of hippocampal rhythms, described in the previous sec- 
tion, arc quite promising: our algorithm discovers functional communities whose 
organization and properties make sense, given the underlying micro-dynamics of 
the model. This suggests that it is worthwhile to apply the method to systems 
where we lack good background knowledge of the functional modules. With- 
out prc-judging the results of those investigations, however, we would like to 
highlight some issues for future work. 

1. Our method needs the full matrix of informational coherences, which is an 
0(N 2 ) computation for a network of size N. If we are interested in the organiza- 
tion of only part of the network, can we avoid this by defining a local community 
structure, as was done for anatomical connectivity by [40]? Alternatively, if we 
know the anatomical connectivity, can we restrict ourselves to calculating the 
informational coherence between nodes which are anatomically tied? Doing so 
with our model system led to basically the same results (not shown), which 
is promising; but in many real-world systems the anatomical network is itself 
uncertain. 

2. The modularity Hamiltonian of Sect. 2.2 measures how much information 
each node shares with other members of its community on a pairwise basis. 
However, some of this information could be redundant across pairs. It might be 
better, then, to replace the sum over pairs with a higher-order coherence. The 
necessary higher-order mutual informations are easily defined [10, 41, 42], but 
the number of measurements needed to estimate them from data grows expo- 
nentially with the number of nodes. However, it may be possible to approximate 
them using the same Chow-Liu bounds employed by [25] to estimate the global 
coherence. 

3. It would be good if our algorithm did not simply report a community 
structure, but also assessed the likelihood of the same degree of modularity 
arising through chance, i.e., a significance level. For anatomical communities, 
Guimera et al. [43] exploit the spin-system analogy to show that random graph 
processes without community structure will nonetheless often produce networks 
with non-zero modularity, and (in effect) calculate the sampling distribution of 
Newman's Q using both Erdos-Renyi and scale-free networks as null models. 
(See however [44] for corrections to their calculations.) To do something like 
this with our algorithm, we would need a null model of functional communities. 
The natural null model of functional connectivity is simply for the dynamics at 
all nodes to be independent, and (because the states are Markovian) it is easy 
to simulate from this null model and then bootstrap p-values. We do not yet, 
however, have a class of dynamical models where there nodes share information, 
but do so in a completely distributed, a-modular way. 



4. A variant of the predictive-state analysis that underlies informational co- 
herence is able to identify coherent structures produced by spatiotemporal dy- 
namics [45]. Moreover, these techniques can be adapted to network dynamics, 
if the anatomical connections are known. This raises numerous questions. Are 
functional communities also coherent structures? Are coherent structures in net- 
works [46] necessarily functional communities? Can the higher-order interactions 
of coherent structures in regular spatial systems be ported to networks, and, if 
so, could functional re-organization be described as a dynamical process at this 
level? 

5.3 Conclusion 

Network dynamical systems have both anatomical connections, due to persis- 
tent physical couplings, and functional ones, due to coordinated behavior. These 
are related, but logically distinct. There are now many methods for using a 
network's anatomical connectivity to decompose it into highly modular com- 
munities, and some understanding of these methods' statistical and statistical- 
mechanical properties. The parallel problem, of using the pattern of functional 
connectivity to find functional communities, has scarcely been explored. It is 
in many ways a harder problem, because measuring functional connectivity is 
harder, and because the community organization is itself variable, and this vari- 
ation is often more interesting than the value at any one time. 

In this paper, we have introduced a method of discovering functional modules 
in stochastic dynamical networks. We use informational coherence to measure 
functional connectivity, and combine this with a modification of the Potts-model 
community-detection procedure. Our method gives good results on a biophysi- 
cal model of hippocampal rhythms. It divides the network into two functional 
communities, one of them based on the inhibitory interneurons, the other con- 
sisting exclusively of excitatory pyramidal cells. The two communities change in 
relative size and re-organize during the switch from gamma to beta rhythm, in 
ways which make sense in light of the underlying model dynamics. While there 
are theoretical issues to explore, our success on a non-trivial simulated network 
leads us to hope that we have found a general method for discovering functional 
communities in dynamic networks. 

Acknowledgments 

Thanks to L. A. N. Amaral, S. Bornholdt, A. Clauset, R. Haslinger, C. Moore and 
M. E. J. Newman for discussing community discovery and/or network dynamics; 
to J. Reichardt for valuable assistance with the implementation of his algorithm 
and general suggestions; and to the editors for their patience. 

A The CSSR Algorithm 

This appendix briefly describes the CSSR algorithm we use to reconstruct the 
effective internal states of each node in the network. For details, see [33] ; for an 



open-source C++ implementation, see http://bactra.org/CSSR/. For recent 
applications of the algorithm to problems in crystallography, anomaly detection 
and natural language processing, see [47, 48, 49, 50, 51]. 

We wish to predict a dynamical system or stochastic process {X t }. By X l a 
we will denote the whole trajectory of the process from time s to time t, inclu- 
sive, by X^ the whole "past" or "history" of the process through time t, and 
by X^ its "future" , its trajectory at times strictly greater than t. The "state" 
of {X t } at time t is a variable, St, which fixes the distribution of all present or 
future observations, i.e., the distribution of X + (t) [29, 31]. As such, the state 
is a minimal sufficient statistic for predicting the future of the process. Suffi- 
ciency is equivalent to the requirement that I[X^; X^~] = I[X^; St], where /[•;•] 
is the mutual information [52]. In general, St = e(Xf), for some measurable 
functional e(-) of the whole past history of the process up to and including time 
t. If {Xt] is Markovian, then e is a function only of X tl but in general the state 
will incorporate some history or memory effects. Each state, i.e., possible value 
of e, corresponds to a predictive distribution over future events, and equally 
to an equivalence class of histories, all of which lead to that conditional distri- 
bution over future events. State-reconstruction algorithms use sample paths of 
the process to find approximations e to the true minimal sufficient statistic e, 
and ideally the approximations converge, at least in probability. The CSSR al- 
gorithm [33] does so, for discrete- valued, discrete-time, conditionally-stationary 
processes. 

CSSR is based on the following result about predictive sufficiency [29, pp. 842- 
843]. Suppose that e is next-step sufficient, i.e., I[X t+1 :X^] — I[X t +i;e(X^~)], 
and that it can be updated recursively: for some measurable function T, e(X^ +1 ) = 
T(e(X^~, X t +i). Then e is predictively sufficient for the whole future of the pro- 
cess — intuitively, the recursive updating lets us chain together accurate next- 
step predictions to go as far into the future as we like. CSSR approximates e 
by treating it as a partition, or set of equivalence classes, over histories, and 
finding the coarsest partition which meets both of the conditions of this the- 
orem. Computationally, CSSR represents states as sets of suffixes, so a history 
belongs to a state (equivalence class) if it terminates in one of the suffixes in 
that state's representation. That is, a history, x^, will belong to the class C, 

G C, if x t t _^ +1 = c, for some suffix c assigned to C, where |c| is the length 
of the suffix. 5 

In the first stage, CSSR tries to find a partition of histories which is sufficient 
for next-step prediction. It begins with the trivial partition, in which all histories 
belong to the same equivalence class, defined by the null suffix (corresponding to 
an IID process), and then successively tests whether longer and longer suffices 
give rise to the same conditional distribution for the next observation which dif- 
fer significantly from the class to which they currently belong. That is, for each 
class C, suffix c in that class, and possible observable value a, it tests whether 

Pr (X t+1 \Xf £ C) differs fromPr |+1 = c,X t _\ c \ = a). (We use stan- 



5 The algorithm ensures that there are never overlapping suffixes in distinct states. 



dard tests for discrepancy between sampled distributions.) If an extended, child 
suffix (ac) does not match its current classes, the parent suffix (c) is deleted from 
its class (C), and CSSR checks whether the child matches any existing class; if 
so it is re-assigned to the closest one, and the partition is modified accordingly. 
Only if a suffix's conditional distribution (Pr (^X t+ i\Xj_ l = ac)) differs signifi- 
cantly from all existing classes does it get its own new cell in the partition. 

The result of this stage is a partition of histories (i.e., a statistic) which is close 
to being next-step sufficient, the sense of "close" depending on the significance 
test. In the second stage, CSSR iteratively refines this partition until it can be 
recursively updated. This can always be done, though it is potentially the most 
time-consuming part of the algorithm 6 . The output of CSSR, then, is a set of 
states which make good next-step predictions and can be updated recursively, 
and a statistic e mapping histories to these states. 

If the true number of predictive states is finite, and some mild technical 
assumptions hold [33] , a large deviations argument shows that Pr (e ^ e) — » 
as the sample size n — > oo. That is, CSSR will converge on the minimal suffi- 
cient statistic for the data-generating process, even though it lacks an explicit 
minimization step. Furthermore, once the right statistic has been discovered, the 
expected L\ (total variation) distance between the actual predictive distribution, 
Pr (AT t + |e(X t - )) and that forecast by the reconstructed states, Pr (X t - |e(X t ~)), 
goes to zero with rate 0(n~ 1/>2 ), which is the same rate as for IID data. The 
time complexity of the algorithm is at worst 0{n) + C>(fc 2L+1 ), where k is the 
number of discrete values possible for X t , and L is the maximum length of suf- 
fices considered in the reconstruction. Empirically, average-case time complexity 
is much better than this. 
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